Data validation is the assessment and verification of data quality, accuracy, and reliability; it supports informed decision-making.
Within the World Justice Project (WJP), we have identified three distinct processes where these data checks are applied:
Our current focus is on developing protocols for the analysis of results.
The data cleaning process uses data validation to ensure the reliability of observations.
In the case of WJP, specific variables are included and cross-referenced with external sources and the Rule of Law Index to ensure data consistency and reliability.
Challenges:
The estimation process involves data validation using two approaches: replication checks and data consistency assessment.
Replication checks: Every estimated data point is independently replicated by another data analyst to ensure accuracy and reliability.
Data consistency assessment: Observations that lead to significant changes are meticulously reviewed, especially in the case of expert surveys. Additionally, aggregate responses from population surveys are assessed to ensure data consistency.
Challenges:
After gathering all the results to be included, it is essential to conduct a comprehensive validation of these findings.
The variations among the results are carefully examined to detect any biases or inaccuracies in any given country.
The results are then compared with other sources to ensure their consistency and reliability.
Challenges:
The protocol primarily focuses on validating the results obtained, rather than directly validating the quality of the underlying data.
This protocol serves as a rigorous quantitative complement and a valuable tool for post-estimation validations.
These methodologies validate WJP Global Index results and reports for Latin America and the Caribbean.
Quantitative methods detect data changes and discrepancies with other sources.
They complement qualitative research but do not replace it.
The methodologies evaluate results, not the data collection process.
Two types of methodologies: internal and external validations.
Internal validations: These checks provide confidence in the data quality, ensuring it is representative, consistent, and free from distortions or biases.
Outlier detection
Changes over time
External validations: These checks provide valuable insights into the validity and credibility of the project results, ensuring they are in line with established standards and corroborated by independent measures.
Rankings and other indexes
Third-party source questions
Outliers represent data points that significantly deviate from most observations.
Detecting outliers during internal data validation ensures the dataset is representative, consistent, and free from anomalies, enabling more reliable and robust analysis.
The step-by-step approach to applying the method is described here: Outlier detection.
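As a rough illustration of the idea (not the WJP implementation), a common interquartile-range (IQR) rule flags values that fall far outside the bulk of the observations. The scores below are hypothetical:

```python
# Illustrative sketch: flag outliers using the IQR rule on a list of
# scores. Points more than k * IQR outside the quartiles are flagged.

def iqr_outliers(scores, k=1.5):
    """Return values lying more than k*IQR outside the quartiles."""
    ordered = sorted(scores)
    n = len(ordered)
    q1 = ordered[n // 4]          # approximate first quartile
    q3 = ordered[(3 * n) // 4]    # approximate third quartile
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < lo or x > hi]

scores = [0.52, 0.55, 0.54, 0.56, 0.53, 0.91]  # hypothetical pillar scores
print(iqr_outliers(scores))  # the 0.91 score is flagged
```

Flagged points are not discarded automatically; they are candidates for the manual review described above.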
Monitoring changes over time helps detect data anomalies and highlights events that may have caused significant fluctuations or alterations.
This monitoring will employ two approaches to identify changes over time: conducting t-tests and analyzing trends.
The step-by-step approach to applying the method is described here: Changes over time.
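One common form of the t-test mentioned above is Welch's two-sample test, which does not assume equal variances across waves. The sketch below (with hypothetical response data, not WJP figures) computes the test statistic for one question across two years:

```python
# Hypothetical sketch of the changes-over-time check: Welch's t statistic
# comparing this year's responses against last year's for one question.
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    return (mean(sample_a) - mean(sample_b)) / sqrt(va / na + vb / nb)

last_year = [0.61, 0.58, 0.63, 0.60, 0.59]
this_year = [0.48, 0.50, 0.46, 0.49, 0.47]  # hypothetical drop in scores
t = welch_t(this_year, last_year)
print(round(t, 2))  # a large |t| signals a change worth reviewing
```

In practice a library routine such as `scipy.stats.ttest_ind` (with `equal_var=False`) would also return the p-value used to decide whether the change is significant.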
This comparative analysis will help determine if countries’ results and ranking order based on their scores align with other sources that aim to measure similar concepts.
We will establish thresholds of 3%, 5%, and 10% above or below their respective positions.
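A minimal sketch of the ranking comparison follows. For simplicity it flags countries whose rank position shifts by more than an absolute number of places, whereas the thresholds above are expressed as percentages of the position; country names and scores are hypothetical:

```python
# Illustrative sketch: flag countries whose rank in our results differs
# from their rank in an external index by more than a threshold.

def rank(scores):
    """Map each country to its 1-based rank (highest score = rank 1)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: i + 1 for i, country in enumerate(ordered)}

def flag_rank_gaps(internal, external, threshold):
    """Return countries whose rank shifts by more than `threshold` places."""
    ri, re = rank(internal), rank(external)
    return {c: (ri[c], re[c]) for c in ri
            if c in re and abs(ri[c] - re[c]) > threshold}

internal = {"A": 0.80, "B": 0.70, "C": 0.60, "D": 0.50}  # hypothetical
external = {"A": 0.75, "B": 0.55, "C": 0.82, "D": 0.40}  # hypothetical
print(flag_rank_gaps(internal, external, threshold=1))  # → {'C': (3, 1)}
```

Flagged countries are then examined case by case, since a rank gap may reflect either a data problem or a genuine conceptual difference between the indexes.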
To provide a more precise comparison between external and internal data, a set of comparable questions will be selected, and the results will be compared at the country level.
The objective is to obtain microdata for testing internal and external data consistency through a mean difference test. If microdata is unavailable, differences exceeding thresholds of 3%, 5%, and 10% will be highlighted.
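The fallback check (when microdata is unavailable) can be sketched as follows: compare the country-level result against the external figure and report the highest threshold tier the relative difference exceeds. The numbers are hypothetical:

```python
# Hypothetical sketch of the no-microdata fallback: flag a country-level
# result whose relative difference from the external source exceeds the
# 3%, 5%, or 10% threshold tiers.

def threshold_tier(internal, external, tiers=(0.03, 0.05, 0.10)):
    """Return the highest tier exceeded by the relative difference, or None."""
    rel_diff = abs(internal - external) / external
    exceeded = [t for t in tiers if rel_diff > t]
    return max(exceeded) if exceeded else None

print(threshold_tier(0.62, 0.60))   # ~3.3% difference -> 0.03 tier
print(threshold_tier(0.67, 0.60))   # ~11.7% difference -> 0.10 tier
print(threshold_tier(0.605, 0.60))  # ~0.8% difference -> None
```

When microdata is available, the mean difference test from the estimation checks replaces this coarser comparison.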
WJP Rule of Law Index
Latin America and the Caribbean Reports
European Union Subnational Project
The EU Subnational Project aims to generate indicators for assessing justice, governance, and the rule of law in 110 regions across the 27 EU member states.
Three resources (QRQ, GPP, and third-party sources) are used to develop indicators for NUTS- and country-level scores.
Comparing results with other sources is crucial for assessing their reliability and building confidence in them.
The Data Analytics Unit implements a protocol integrating these methodologies to validate and compare project outputs.
The protocol complements qualitative checks, ensuring accuracy, identifying inconsistencies, and instilling confidence in the obtained results.
At pillar level
At sub-pillar level
Insights report
Web platform to test the outcomes
WJP Rule of Law Index Team
Each pillar consists of over 300 indicators/questions, with the first pillar having the most indicators and the sixth pillar having the fewest.
The catalog does not include any indicators for sub-pillars 6.4-6.8, 7.5-7.6, and 8.6.
Sub-pillars 2.3, 7.4, and 8.2 have fewer than 10 indicators/questions available in the catalog.
Subnational questions account for less than 7% of the total questions across all pillars. Pillar 7 has the fewest questions, with only one, while Pillar 2 has the most, with 24.
Completed development of the conceptual framework and questionnaire.
Produced a near-final version of the catalog.
Identified a comprehensive list of potential sources for comparison for each pillar and sub-pillar.
Successfully implemented codes and step-by-step instructions for all internal tests.
Achieved significant progress in designing the desired outcomes, with examples of outcomes serving as initial reference points.
Identify new sources for sub-pillars with insufficient information to initiate external comparisons.
Develop data cleaning functions for the selected external sources.
Establish a threshold for data comparison.
Integrate qualitative checks into the quantitative analysis.
Begin designing the platform.
What level of analysis will be used for data checking? Will it be at the country level or the NUTS level?
Is the process of comparison with external sources necessary in the cleaning process? With which sources would it be compared?
If we choose to aggregate the scores by country, how should we determine the weighting?
Determining which third-party sources are appropriate for data estimation and assessment.
Identifying the specific stages of the process where data analytics can intervene to improve it.
Can the questions be cross-referenced with the specified NUTS regions?
If we decide to aggregate the scores by country, how should we determine the weighting of each component?
How should we address situations where external data for certain sub-pillars is unavailable for comparison?